Cleanup libcudf strings regex classes #10573
Codecov Report
@@               Coverage Diff                @@
##           branch-22.06   #10573      +/-   ##
================================================
+ Coverage         86.33%   86.36%   +0.02%
================================================
  Files               140      140
  Lines             22289    22289
================================================
+ Hits              19244    19249       +5
+ Misses             3045     3040       -5
@gpucibot merge
All libcudf strings regex calls now use global device memory for state data when evaluating regex on strings. Previously, separate templated kernels were used to store state data in fixed-size stack memory, depending on the number of instructions resolved from the provided regex pattern. This required the CUDA driver to allocate a large amount of device memory when launching the kernel. That memory is managed by the launcher in the driver and so is not under the control of RMM. The state data is now held and managed per string per instruction in global device memory allocated through a memory resource.

This is an internal change only and results in no behavior changes. Overall performance on the current benchmarks has not changed, though much more memory may be required to execute any of the regex APIs, depending on the number of instructions in the pattern and the total number of strings in the column. Every effort has been made to not reduce performance relative to the stack-based approach. Additional optimizations here include copying the `reprog_device` class data to shared memory (when it fits). Further optimizations are expected in later PRs. The compile time of the files that use regex is also faster, since only a single kernel is generated instead of 4 in the templated, stack-based implementation.

This PR is dependent on PR #10573.

Authors:
- David Wendt (https://github.com/davidwendt)

Approvers:
- Vyas Ramasubramani (https://github.com/vyasr)
- Mike Wilson (https://github.com/hyperbolic2346)
- Jake Hemstad (https://github.com/jrhemstad)

URL: #10600
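To make the memory trade-off in the commit message above concrete, here is a minimal host-side C++ sketch of "state data per string per instruction" held in one flat buffer. It is not libcudf's implementation: `state_entry`, `working_memory_bytes`, `state_view`, and `state_for_string` are hypothetical names, a `std::vector` stands in for the device buffer that would come from an RMM memory resource, and the real per-instruction state layout is assumed rather than known.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one instruction's state entry.
// The real libcudf state layout is not shown in the source and likely differs.
using state_entry = std::int32_t;

// Working memory grows with both the number of strings in the column and the
// number of instructions in the compiled pattern (one entry per pair).
std::size_t working_memory_bytes(std::size_t num_strings, std::size_t num_instructions)
{
  return num_strings * num_instructions * sizeof(state_entry);
}

// Non-owning view of the state entries that belong to a single string.
struct state_view {
  state_entry* data;
  std::size_t size;
};

// Slice one flat buffer so that string `idx` gets its own contiguous run of
// `num_instructions` entries. The vector here models a global device buffer.
state_view state_for_string(std::vector<state_entry>& buffer,
                            std::size_t num_instructions,
                            std::size_t idx)
{
  assert((idx + 1) * num_instructions <= buffer.size());
  return state_view{buffer.data() + idx * num_instructions, num_instructions};
}
```

Under these assumptions, the buffer size is the product of the string count and the instruction count, which is why a long pattern applied to a large column can need far more memory than the fixed-size stack approach, while a single non-templated kernel can serve every pattern size.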
Refactors some of the internal libcudf regex classes used for executing regex on strings. This is the first part of a set of changes to reduce the kernel launch memory size for the regex code. A follow-on PR will change the stack-based state management to a device-memory approach. The changes here are isolated to ease the review of the next PR. Most of the code has been moved or refactored, along with general cleanup such as adding `const` qualifiers and removing some unnecessary pass-by-reference/pointer parameters.
None of the calling routines currently require changes and no behavior has changed.
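As an illustration of the kind of cleanup described above (adding `const` and dropping unnecessary pass-by-reference/pointer), the following C++ sketch shows a before/after pair. The `instruction` struct and both functions are hypothetical and are not taken from the libcudf sources; they only show the general pattern for a small, trivially copyable type.

```cpp
#include <cstdint>

// Hypothetical, trivially copyable record; not libcudf's actual instruction type.
struct instruction {
  std::int32_t type;
  std::int32_t next;
};

// Before: a small struct is taken by non-const pointer and the result is
// returned through an output reference, although nothing is modified.
inline void next_index_before(instruction* inst, std::int32_t& out)
{
  out = inst->next;
}

// After: the struct is passed by const value and the result is returned
// directly, so the signature itself shows that the input is read-only.
inline std::int32_t next_index_after(instruction const inst)
{
  return inst.next;
}
```

Both versions compute the same value, so a change of this shape leaves calling behavior unchanged, consistent with the statement that no behavior has changed.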